Release 10.1A: OpenEdge Data Management:
SQL Development


Optimizer phases

The optimization process is divided into several phases. Some phases deal with internal infrastructure, such as minimizing data handling or temp-table usage. Others deal with significant cost factors and are straightforward to understand. Each phase addresses a specific type of optimization:

The optimizer follows a cost-based model. In each stage, whenever multiple alternatives are available, the optimizer estimates the cost for each and selects the cheapest. The cost computation takes into account:

The following sections provide details on the optimization phases.

Pushing restrict operations close to the data origin

This stage consists of moving restrict operators as far down the query tree as possible. This reduces the number of tuples moving up the tree for further processing and minimizes the amount of data handled. When restrict operations on a join node cannot be moved below the join node, they are set as join conditions. When multiple predicates are moved down the tree to the same relative position, they are reassembled into a single restrict operation, as shown in Example 10–1.

Example 10–1: Query statement optimization

SELECT Name FROM Employee

WHERE Salary > 4000 AND Salary <= 6000

AND Employee.DeptNum = Department.DeptNum;

The optimizer takes the input tree and transforms it as shown below. The restrictions Salary > 4000 and Salary <= 6000 are moved down the tree, below the join node, since they apply to a single table. The restriction Employee.DeptNum = Department.DeptNum, applying to two tables, stays above the join node.

Using indexes for restrictions

This optimization phase consists of recognizing those cases where an existing index can be used to evaluate a restriction and converting a table scan into an index bracket or set of contiguous index entries. An index bracket is extremely effective in limiting the number of rows that must be processed.

Choosing the best index

To choose an index, the optimizer performs several stages. These stages determine whether an index can be used to process a restrict operation and, if there are multiple indexes to choose from, which index will be used:

Predicate expressions

When the predicates of an SQL statement use the OR logical operator to combine expressions that compare the same column with a constant, the optimizer converts these expressions to a single IN predicate. The purpose of these transformations is, where possible, to combine multiple predicates into a single predicate for simpler evaluation in this and later stages.

Similarly, a LIKE predicate on an index key, where the LIKE pattern has a prefix of fixed characters, is converted to a BETWEEN predicate.

Generating candidate indexes

For every predicate in an SQL statement, the optimizer checks to see if there are indexes that include the columns referenced in the predicate.

Once the optimizer knows which indexes exist on the relevant tables, it generates a list of all the possible index predicates that could be used. For each predicate for which there is an index, the optimizer checks whether:

Selecting an index

When the list of candidate index predicates has been determined, the optimizer selects which, if any, it will use for an index scan operation.

This selection is cost-based. The optimizer computes the cost for each of the index candidates and the cost for a table scan using the default index. The candidate with the lowest cost is chosen.

Providing index hints

You can specify an index for each table in the FROM clause of a SELECT query.

SELECT column_list 
FROM table_list 
WHERE table_name [ [ AS ] table_alias ] 
[ WITH (INDEX ( index_val ),...)] 

index_val is a string that indicates the name of the index.

If a candidate plan is generated with the specified index, the optimizer will use it. If the optimizer is unable to generate a candidate plan with the specified index, it ignores the hint.

Join optimization

There are two distinct optimization tasks done as part of the join optimization stage:

Determining join order among adjacent join nodes

After identifying a set of adjacent join nodes, the optimizer uses the available statistics to estimate the cardinality (the number of rows in the table or intermediate result) and selectivity (percentage of rows a predicate returns) for each subtree of the join nodes. It then uses the following criteria to determine the join order:

Choosing the join algorithm

Once the join order has been established, each join node is analyzed to select from among the following algorithms:

The optimizer generates, when possible, candidates for each algorithm. For each join node, candidates are generated by:

Once a set of candidates exists, the optimizer selects the least costly candidate.

Augmented nested loop join

The augmented nested loop (ANL) is by far the most common join method. An augmented nested loop join is performed by doing a scan over the left subtree and for each row in it, performing an index bracket scan on a portion of the right subtree. The right subtree is read as many times as there are rows in the left subtree.

To be a candidate for an ANL join, the subtree pair for a join node must meet the following criteria:

When an ANL join is possible on several indexes, the least-cost index is chosen.

When there is an index defined on the left subtree’s table instead of on the right, the optimizer analyzes the cost of swapping the subtrees to make an ANL join possible.

When neither subtree’s table has an index defined on the join column, the optimizer analyzes the cost of creating a dynamic index on one or both of the subtrees.

Merge join

A merge join is performed by opening simultaneous scans on both the left and right subtrees. Each row that satisfies the join condition is output by the join algorithm. Depending upon the result of the join column comparison, either the left or right scan pointer is advanced. The left and right subtrees are each read once. A merge join is almost never chosen because its cost invariably exceeds an ANL join.

Nested loop join

A nested loop join is performed by doing a scan over the left subtree and for each row in it performing a full scan of the right subtree.

This is the default join algorithm, which can be used for any join. However, it is usually less efficient than the other methods. Usually, either an existing index or a dynamic index, used in an ANL join, will cost much less. Occasionally, when subtree cardinalities are very low, possibly because of index bracketing, nested loop will be the method with the least cost.

Sort optimization

The optimizer performs two optimizations designed to avoid sort operations. The first optimization is to eliminate redundant sorts. The second optimization is to convert table scans into index bracket scans.

Eliminating redundant sorts

The optimizer checks whether the query tree contains unnecessary sort nodes. For example, when an SQL statement contains both a GROUP BY clause and an ORDER BY clause that refers to the same column, at most one sort is needed.

A sort node is also redundant when the immediate descendant node of the sort node is an index bracket scan on the sort column. That is, the sort is redundant when the data input to the sort was read using an index with the needed sort order.

Converting table scans to index bracket scans

When a leaf node of a subtree is a table scan, the optimizer checks whether any indexes that exist on the table match the sort columns. If so, it analyzes the cost of each possible index bracket scan and compares the least of those with the sum of the cost of the table scan and sort operation.

If the analysis shows an index bracket scan as having less cost than the table scan and sort operation, the optimizer converts the table scan to the index bracket scan and removes the sort node.

Indexes to evaluate MAX/MIN functions

This stage of optimization examines subtrees that contain MIN and MAX aggregate functions. The optimizer checks if any index on the table matches the column specified in the function. If so, it replaces the table scan at the leaf node with an index bracket scan.

The index bracket scan looks up the first or last value of the relevant index key. The first and last values represent the MIN and MAX values, respectively, for ascending indexes, and the MAX and MIN values for descending indexes.

Evaluating aggregate functions without fetching the table rows is not possible for indexed character columns because index entries for character data contain the “sort-weight” form of the column value, not the actual column value.

Index bracket scan optimization

This stage checks whether a table scan can be replaced by an index bracket scan. This is possible when a subtree meets the following criteria:


Copyright © 2005 Progress Software Corporation
www.progress.com
Voice: (781) 280-4000
Fax: (781) 280-4095